Abstract: In recent years, intelligent methods have been widely considered for forecasting hydrologic processes. In this research, the monthly discharge of the Kakareza River, located in Lorestan province in western Iran, was forecast at the Dehno station using support vector machine (SVM) and gene expression programming (GEP) methods. Several combinations of antecedent discharge from the period 1979-2015 were evaluated as input data for estimating monthly discharge. The correlation coefficient, root mean square error, and Nash-Sutcliffe coefficient were used to evaluate and compare the performance of the methods. The results showed that the combined input structure, used with the intelligent methods surveyed, gave an acceptable estimate of the discharge of the Kakareza River. In addition, comparison between the models shows that the support vector machine performs better than the other model in inflow estimation. In terms of accuracy, the support vector machine achieved a correlation coefficient of 0.970, a root mean square error of 0.08 m³/s, and a Nash-Sutcliffe coefficient of 0.94. In summary, the support vector machine method has a better capability to estimate the minimum, maximum, and other flow values.
I. INTRODUCTION

Nowadays, one of the most important issues in managing floods and preventing the economic and physical damage they cause is the correct prediction of river flows. Accurate estimates of inflow to reservoirs can play an important role in the planning and management of water resources. However, the many factors that influence this phenomenon make its analysis difficult. Statistical and regression models are the most commonly used analytical techniques, but because they usually rely on a linear description of these phenomena, their results carry error and they cannot model temporal changes with acceptable accuracy. It is therefore imperative to choose a model that can use the effective factors to estimate the inflow acceptably. Recently, artificial intelligence (AI) techniques have been applied to estimate and predict discharge (Kisi and Cobaner 2009).

These AI techniques are simple and robust, and can handle complex non-linear processes with ease. The literature shows that AI techniques such as gene expression programming (GEP) and support vector machines (SVM) have been used to predict discharge (Wang et al. 2008). Because they are fully non-parametric, AI techniques have the major advantage that they do not require an a priori concept of the relations between the input variables and the output data (Bhagwat and Maity 2012). A characteristic feature of AI models is that they are able to analyze the stochasticity, dynamics, patterns, and attributes in the input variables used to simulate the discharge data, and so they are considered more feasible than other methods of estimating discharge (e.g. experimental approaches and physically-based models).

Examples that use the capability of SVM include stage-discharge modeling (Barzegar et al. 2019; Sahoo et al. 2019; Elkiran et al. 2019; Rezaei et al. 2019; Adnan et al. 2019; Fathian et al. 2019; Yassen et al. 2018; Imani et al. 2018; Tongal et al. 2018; Ghorbani et al. 2016; Londhe and Gavraskar 2018; Ghazvinei et al. 2017; Karahan et al. 2014; He et al. 2014).

In one study, SVM was used to forecast seasonal and hourly flow. Using the snow water equivalent and the flow volumes of previous periods, flow volume was forecast at six-month and 24-hour time scales, and the model gave satisfactory results (Asefa et al. 2005). Genetic programming was used to model the rainfall-runoff process with daily data in two fairly large basins in China, and the GP results showed good agreement with the observed data (Jayawardena et al. 2005). The support vector machine has also been presented as a promising method for hydrological prediction; comparison of its performance with that of ARMA and ANN models demonstrated that SVM is a strong candidate for the prediction of long-term discharges (Lin et al. 2006). Genetic programming and an artificial neural network were used to forecast the daily discharge of the Shevell River in the United States; both methods gave acceptable results, but GP had relatively higher precision than the artificial neural network (Guven 2009). SVM has been used to forecast daily river flow, and comparison of the model results with observed daily values showed good performance of the support vector machine in estimating daily discharge (Moharrampour et al. 2012).

In summary, previous research supports the use of these methods, and the Kakareza River is one of the most important rivers in Lorestan province and the most important source of water supply for its neighboring areas. Over the past decades the flow rate of the river has decreased, which can be explained by lower fluxes and surface flows in the river basin. Therefore, river discharge modeling and management measures to improve its water quality are essential. The aim of this study was to estimate the discharge of the Kakareza River using a support vector machine, which is based on the principle of structural risk minimization. In simulation, supervised learning with radial basis functions estimates the parameter faster and with less error than other kernel functions (Vapnik 1995; Vapnik 1998).

II. MATERIALS AND METHODS

Case study and data used

The study area is the Kakareza River in Lorestan province, Iran. It is one of the permanent rivers of the province and originates in the mountains southeast of Aleshtar and Biranshahr (Dehno). Where the river passes through the suburbs of Aleshtar it is known as the Kakareza. The river lies between longitudes 48°15′ and 49° E and latitudes 32°22′ and 33°52′ N, and it flows to the east of Khorramabad (the capital city of Lorestan province). It is one of the headwater branches of the Karkhe River in the Zagros Mountains, with an average altitude of 1550 meters above sea level. The Kakareza River basin covers about 1148 square kilometers, and the river is 85 km long. Along its way the Kakareza joins the Kashkan, Seymareh, and Karkhe rivers and eventually flows into the Persian Gulf. The geographical location of the study area is shown in Figure 1. In this study, monthly runoff data from the Horod station (Kakareza) for 1979 to 2015, obtained from Lorestan Regional Water, were used. Table 1 shows the statistical properties of the Kakareza River discharge during this period.
Fig 1. Geographical location of the Kakareza River
Table 1. Statistical properties of the discharge parameter (1979-2015)

| Parameter | Training Minimum | Training Mean | Training Maximum | Testing Minimum | Testing Mean | Testing Maximum |
|---|---|---|---|---|---|---|
| Q | 0.01 | 2.718464 | 25.15 | 0.05 | 1.701161 | 21.69 |


One of the most important steps in modeling is selecting the right combination of input variables. Table 2 shows the structure of the input combinations.

Table 2. The structure of input combinations

| Structure | Input | Output |
|---|---|---|
| 1 | Q(t-1) | Q(t) |
| 2 | Q(t-1), Q(t-2) | Q(t) |
| 3 | Q(t-1), Q(t-2), Q(t-3) | Q(t) |
| 4 | Q(t-1), Q(t-2), Q(t-3), Q(t-4) | Q(t) |


In this table, Q(t-4), Q(t-3), Q(t-2), and Q(t-1) are the discharges at times t-4, t-3, t-2, and t-1, respectively, used as inputs, and Q(t) is the discharge at time t, taken as the output. Because of the significant cross-correlation between the input and output data (Table 3), different combinations of input parameters were used to obtain an optimal model for estimating the inflow to the Kakareza River. To estimate the discharge of the Kakareza River with Gene Expression Programming and Support Vector Machine, 432 hydrometric records of the catchment registered during the period 1979-2015 were used, with 345 records for training and the remaining 87 records for verification.


Table 3. Correlation between input and output parameters

| | Q(t-1) | Q(t-2) | Q(t-3) | Q(t-4) |
|---|---|---|---|---|
| Q(t) | 0.980 | 0.964 | 0.928 | 0.784 |
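The input structures of Table 2, the lag correlations of Table 3, and the chronological split into training and verification records can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the short discharge series are hypothetical and are not the Kakareza data.

```python
from math import sqrt

def make_lagged(series, n_lags):
    """Build rows [Q(t-1), ..., Q(t-n_lags)] with target Q(t)."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append([series[t - lag] for lag in range(1, n_lags + 1)])
        targets.append(series[t])
    return inputs, targets

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def chronological_split(inputs, targets, n_train):
    """Keep time order: the first n_train rows train, the rest verify."""
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test

# Hypothetical monthly discharge values (m3/s), for illustration only.
q = [2.1, 3.4, 5.0, 4.2, 2.8, 1.5, 0.9, 0.6, 0.8, 1.4, 2.0, 2.6]

X, y = make_lagged(q, n_lags=4)            # structure 4 of Table 2
for lag in range(4):
    r = pearson([row[lag] for row in X], y)
    print(f"corr(Q(t-{lag + 1}), Q(t)) = {r:.3f}")

(train_X, train_y), (test_X, test_y) = chronological_split(X, y, n_train=6)
```

With the study's record counts, `n_train` would be 345 and the remaining 87 rows would form the verification set; a chronological split is assumed here because the source does not state how the records were partitioned.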


Gene Expression Programming 

The Gene Expression Programming method was introduced by Ferreira in 1999 (Ferreira 2001). It combines genetic algorithms (GA) and genetic programming (GP): simple linear chromosomes of fixed length, similar to those used in genetic algorithms, are combined with branched structures of different sizes and shapes, similar to the parse trees of genetic programming. Because all branched structures of different shapes and sizes are encoded in linear chromosomes of fixed length, the phenotype and genotype are separated from each other, and the system can exploit all the evolutionary advantages that this separation brings. Although the phenotype in GEP consists of the same branched structures used in GP, the branched structures inferred by GEP (also called expression trees) are the expression of fully independent genomes. In short, modifications take place in the linear structure, which is then expressed as a tree structure; as a result, only the modified genome is passed to the next generation, and no heavy structures are needed for reproduction and mutation (Ferreira 2001). In this method, phenomena are modeled by a set of functions and a set of terminals. The function set generally includes the main arithmetic functions {+, -, ×, /}, trigonometric functions, or any other mathematical function {√, x², sin, cos, log, exp, ...}, as well as functions defined by the user that are believed to be appropriate for interpreting the model. The terminal set consists of the constants of the problem and the independent variables (Ferreira 2001). GeneXproTools 4.0 software was used to apply the gene expression programming method. More information can be found in Ghorbani et al. (2012).
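The separation of genotype and phenotype described above can be illustrated with a small sketch that decodes a fixed-length linear gene into an expression tree, level by level, and evaluates it. This is only an illustration of the encoding idea, not the GeneXproTools implementation; the gene string, the restriction to the four arithmetic operators, and the terminal names a, b, c are assumptions made for the example.

```python
import operator

# Function set limited to the four basic arithmetic operators (arity 2).
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def decode(gene):
    """Read the linear gene left to right and fill the tree breadth-first.

    Each node is [symbol, children]. Symbols left over after the tree is
    complete form the non-coding region and are ignored. The gene is
    assumed to be valid (its tail is long enough to close every branch).
    """
    nodes = [[symbol, []] for symbol in gene]
    queue = [nodes[0]]
    next_free = 1
    while queue:
        node = queue.pop(0)
        arity = 2 if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = nodes[next_free]
            next_free += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]

def evaluate(node, values):
    """Evaluate the expression tree for the given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        left, right = (evaluate(child, values) for child in children)
        return FUNCTIONS[symbol](left, right)
    return values[symbol]

def to_infix(node):
    """Render the expression tree as a readable formula."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"
    return symbol

# Hypothetical gene: head "+*a" (length 3), tail "bcab" (length 4).
tree = decode("+*abcab")
print(to_infix(tree))                              # ((b * c) + a)
print(evaluate(tree, {"a": 2, "b": 3, "c": 4}))    # 14
```

Because the chromosome has fixed length, a mutation that changes one symbol always yields another decodable gene, which is the practical benefit of keeping the linear genotype separate from the tree phenotype.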

Support Vector Machine

Support Vector Machine is an efficient learning system based on optimization theory that uses the principle of structural risk minimization and yields a globally optimal solution (Vapnik 1998). In the SVM regression model, a function associated with the dependent variable y is estimated, where y is itself a function of several independent variables x (Xu et al. 2007). As in other regression problems, the relationship between the dependent and independent variables is assumed to be determined by an algebraic function f(x) plus some allowable error (ε).


$$f(x) = W^{T}\phi(x) + b \tag{1}$$

$$y = f(x) + \text{noise} \tag{2}$$


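The best values of the four evaluation criteria (CC, RMSE, NS, and Bias) are 1, 0, 1, and 0, respectively.

$$CC = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^{2}\,\sum_{i=1}^{N}(y_i-\bar{y})^{2}}}, \qquad -1 \le CC \le 1 \tag{10}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-y_i)^{2}} \tag{11}$$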


Here W is the coefficient vector, b is the constant of the regression function, and φ is the mapping function into the feature space; the goal is to find a functional form for f(x). This is achieved by training the SVM model on a set of samples (the training set). To calculate W and b, the error function of the ε-SVM (Equation 3) must be minimized subject to the constraints in Equation 4 (Shin et al. 2005).

$$\min \; \frac{1}{2}W^{T}W + C\sum_{i=1}^{N}\xi_i + C\sum_{i=1}^{N}\xi_i^{*} \tag{3}$$

$$W^{T}\phi(X_i) + b - y_i \le \varepsilon + \xi_i^{*}, \qquad y_i - W^{T}\phi(X_i) - b \le \varepsilon + \xi_i, \qquad \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, 2, \ldots, N \tag{4}$$
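To make the roles of ε, C, and the slack variables in Equations 3 and 4 concrete, the sketch below evaluates the ε-SVM objective for a given W and b. It is a worked illustration only, not a solver: it assumes the identity mapping φ(x) = x, and the sample values, function names, and parameter settings are hypothetical.

```python
def slack_variables(w, b, X, y, eps):
    """Smallest slacks that satisfy the constraints of Equation 4.

    Assumes the identity mapping phi(x) = x, so f(x) = w . x + b.
    """
    xi, xi_star = [], []
    for x_i, y_i in zip(X, y):
        f_i = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        # y_i - f(x_i) <= eps + xi_i   (prediction too low)
        xi.append(max(0.0, y_i - f_i - eps))
        # f(x_i) - y_i <= eps + xi_i*  (prediction too high)
        xi_star.append(max(0.0, f_i - y_i - eps))
    return xi, xi_star

def primal_objective(w, b, X, y, eps, C):
    """Value of Equation 3: 0.5 * W'W + C * sum(xi) + C * sum(xi*)."""
    xi, xi_star = slack_variables(w, b, X, y, eps)
    regularizer = 0.5 * sum(w_k ** 2 for w_k in w)
    return regularizer + C * sum(xi) + C * sum(xi_star)

# Hypothetical one-dimensional samples.
X = [[1.0], [2.0], [3.0]]
y = [1.05, 2.5, 2.0]
w, b = [1.0], 0.0

xi, xi_star = slack_variables(w, b, X, y, eps=0.1)
print(xi, xi_star)
print(primal_objective(w, b, X, y, eps=0.1, C=10.0))
```

In this example the first sample lies inside the ε-tube and contributes no penalty, while the other two are penalized only by the amount they fall outside it. A larger C makes such deviations more costly relative to the size of W.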


In the above equations, C is a positive constant that determines the penalty applied when an error occurs, φ is the mapping function into the feature space, N is the number of samples, and $\xi_i$ and $\xi_i^{*}$ are slack variables. Finally, the SVM function can be rewritten as follows (Shin et al. 2005):

$$f(x) = \sum_{i=1}^{N}\bar{\alpha}_i\,\phi(x_i)^{T}\phi(x) + b \tag{5}$$

Here $\bar{\alpha}_i$ are the averaged Lagrange multipliers. Computing φ(x) in the feature space can be very complex. To solve this problem, the usual practice in SVM is to choose a kernel function, defined as follows:

$$K(x_i, x) = \phi(x_i)^{T}\phi(x) \tag{6}$$

Different kernel functions can be used to create different types of ε-SVM. The kernel functions commonly used in SVM regression models are the polynomial kernel, with three parameters to be determined; the radial basis function (RBF) kernel, with one parameter to be determined; and the linear kernel. They are calculated as follows (Vapnik 1998):

$$k(x_i, x_j) = (x_i \cdot x_j)^{d} \tag{7}$$

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i\rVert^{2}}{2\sigma^{2}}\right) \tag{8}$$

$$k(x_i, x_j) = x_i \cdot x_j \tag{9}$$
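The three kernels of Equations 7 to 9 can be written directly as small functions. The sketch below is a plain-Python illustration; the function names, the default degree d = 2, the default width σ = 1, and the sample vectors are assumptions chosen for the example rather than the settings used in the study.

```python
from math import exp

def linear_kernel(x, z):
    """Equation 9: inner product of the two input vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, d=2):
    """Equation 7: inner product raised to the power d."""
    return linear_kernel(x, z) ** d

def rbf_kernel(x, z, sigma=1.0):
    """Equation 8: Gaussian of the squared distance, with width sigma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-squared_distance / (2.0 * sigma ** 2))

def gram_matrix(samples, kernel):
    """Kernel value for every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]

# Hypothetical input vectors, e.g. [Q(t-1), Q(t-2)].
samples = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.1]]

print(linear_kernel(samples[0], samples[1]))
print(polynomial_kernel(samples[0], samples[1], d=2))
for row in gram_matrix(samples, rbf_kernel):
    print([round(value, 4) for value in row])
```

The RBF kernel depends only on the distance between two inputs, so its single width parameter controls how quickly the influence of a training sample decays; this is the one parameter referred to in the text.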

Evaluation Criteria 


In this research, the accuracy and efficiency of the models were evaluated using the correlation coefficient (CC), root mean square error (RMSE), Nash-Sutcliffe coefficient (NS), and Bias, as defined in Equations (10) to (12).

$$NS = 1 - \frac{\sum_{i=1}^{N}(x_i - y_i)^{2}}{\sum_{i=1}^{N}(x_i - \bar{x})^{2}}, \qquad -\infty < NS \le 1 \tag{12}$$

In the above relations, $x_i$ and $y_i$ are the observed and calculated values at time step i, respectively, N is the number of time steps, and $\bar{x}$ and $\bar{y}$ are the mean observed and mean calculated values, respectively.
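The criteria in Equations 10 to 12 can be computed with a few lines of code. The sketch below uses the standard definitions of the correlation coefficient, RMSE, and Nash-Sutcliffe coefficient; the function names and the observed and simulated values are hypothetical.

```python
from math import sqrt

def correlation_coefficient(obs, sim):
    """Equation 10: Pearson correlation between observed and simulated."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    sxy = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    sxx = sum((o - mean_obs) ** 2 for o in obs)
    syy = sum((s - mean_sim) ** 2 for s in sim)
    return sxy / sqrt(sxx * syy)

def rmse(obs, sim):
    """Equation 11: root of the mean squared difference."""
    n = len(obs)
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)

def nash_sutcliffe(obs, sim):
    """Equation 12: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator

# Hypothetical observed and simulated discharges (m3/s).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

print(f"CC   = {correlation_coefficient(observed, simulated):.3f}")
print(f"RMSE = {rmse(observed, simulated):.3f} m3/s")
print(f"NS   = {nash_sutcliffe(observed, simulated):.3f}")
```

A perfect simulation gives CC = 1, RMSE = 0, and NS = 1, which matches the best values stated for these criteria; an NS of 0 would mean the model is no better than predicting the observed mean.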

III. RESULTS AND DISCUSSION

The general purpose of intelligent models is to express the relations between variables whose complexity is difficult to describe because of the high uncertainty in their nature. Stream flow is one of the most important hydrological parameters and is of great importance for subsequent planning steps. This method was used in order to reduce the error and to estimate the flow rate with high accuracy from the smallest number of input parameters, and it provides better performance than approximate methods. The aim of this study is to capture this natural complexity between hydrological parameters and to provide a model for future prediction. Because discharge is more important than the other parameters, it was selected as the target variable.

The results of Gene Expression Programming

Gene expression programming was considered for estimating the inflow to the Kakareza River because it can select the variables of the model, remove variables with less impact, and provide an explicit relationship. All four input combinations were incorporated in order to determine the significant variables. For further examination, two function sets were considered: the four basic operators together with additional mathematical functions (F1), and the default set of the four basic arithmetic operators only (F2). The choice of these operators was based on the studies of Ghorbani et al. (2012) and Khatibi et al. (2012).

F1: {+, -, *, /, √, Exp, Ln, x², x³, ∛, Sin, Cos, Atan} (13)

F2: {+, -, *, /} (14)

The results of the gene expression programming model for both function sets (Table 4) show that the F2 set is more accurate than F1 in both the training and verification stages, with a maximum correlation coefficient of R = 0.88, a root mean square error of RMSE = 0.15 m³/s, and NS = 0.76 in the verification stage. Therefore, gene expression programming with the F2 set, which includes only the four main mathematical operators and gives a simple mathematical relationship, is the most accurate for estimating the inflow to the Kakareza River. The scatter plot for the verification stage in Fig. 2b compares the fit line of the values computed with the four mathematical operators against the best-fit line y = x. As the figure shows, the estimated and observed values lie on the line y = x, which denotes equality of estimated and observed values, except for a few points. The performance of gene expression programming in estimating inflow is therefore acceptable: the model worked well, and the estimated values are close to the actual values.


These results are consistent with the research of Kisi and Shiri (2012). The equation obtained from gene expression programming results from a random combination of the terminals and functions. Therefore, if the relationship between inputs and outputs is linear but operators such as sin and cos are included in the function set, gene expression programming uses the selected operators to extract the relationship, which reduces the accuracy of the model. In this study, to increase the precision of the model, operators such as sin and cos were excluded, and for the sake of accuracy and simplicity the model derived from the four basic mathematical operations was proposed to estimate discharge.


Table 4. Results of the gene expression programming model using two sets of selected mathematical operators

| Number | Model | Training R | Training RMSE (m³/s) | Training NS | Testing R | Testing RMSE (m³/s) | Testing NS |
|---|---|---|---|---|---|---|---|
| 1 | F1 | 0.70 | 0.31 | 0.63 | 0.76 | 0.25 | 0.64 |
| 1 | F2 | 0.73 | 0.32 | 0.64 | 0.78 | 0.23 | 0.66 |
| 2 | F1 | 0.75 | 0.38 | 0.68 | 0.80 | 0.22 | 0.68 |
| 2 | F2 | 0.76 | 0.34 | 0.69 | 0.80 | 0.21 | 0.71 |
| 3 | F1 | 0.79 | 0.26 | 0.71 | 0.82 | 0.19 | 0.72 |
| 3 | F2 | 0.80 | 0.21 | 0.73 | 0.84 | 0.19 | 0.73 |
| 4 | F1 | 0.80 | 0.19 | 0.73 | 0.87 | 0.15 | 0.76 |
| 4 | F2 | 0.82 | 0.15 | 0.75 | 0.88 | 0.15 | 0.76 |

Fig 2. Optimal gene expression programming model for the verification data: a) computed and observed values over time, b) scatter plot between estimated and observed values
The results of Support Vector Machine

To estimate the inflow to the Kakareza River with the SVM model, several types of kernel function can be examined. The linear, polynomial, and radial basis function kernels were selected, as they are the types commonly used in hydrology. The results of the models studied are given in Table 5. According to this table, input structure 4 with the radial basis function kernel has the highest correlation coefficient (R = 0.97), the lowest root mean square error (RMSE = 0.08 m³/s), and NS = 0.94 in the verification stage, and is therefore the optimal solution among the models. Fig. 3 shows the best model for the verification data.

As Fig. 3b shows, the discharge values computed by the support vector machine model in the verification stage correspond with the observed values; only some values differ slightly from the best-fit line y = x. Fig. 3a likewise shows the high capability of the model. Table 5 also shows that the support vector machine performs well in estimating the discharge of the Kakareza River even when only one input parameter is used. This means that where records are statistically deficient, the model can still perform acceptably in flow forecasting with a minimum of input parameters, such as the flow rate of the previous time step. Fig. 3 shows the computed and observed values over time; the model estimated most of the values with acceptable accuracy, so that the estimates are close to their actual values. The results are consistent with the research of Buyukyildiz and Kumcu (2017) and Nourani et al. (2015). This can be explained by the fact that the support vector machine is based on the principle of structural risk minimization. Therefore, in simulation, supervised learning with radial basis functions predicts the parameter faster and with less error than the other kernel functions, which is an advantage of radial basis functions.


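Table 5. Results of the three kernel functions used in the Support Vector Machine for training and verification data

| Number | Kernel | Training R | Training RMSE (m³/s) | Training NS | Testing R | Testing RMSE (m³/s) | Testing NS |
|---|---|---|---|---|---|---|---|
| 1 | RBF | 0.87 | 0.13 | 0.76 | 0.90 | 0.16 | 0.88 |
| 1 | Poly | 0.74 | 0.19 | 0.67 | 0.79 | 0.17 | 0.80 |
| 1 | Linear | 0.64 | 0.24 | 0.54 | 0.71 | 0.29 | 0.69 |
| 2 | RBF | 0.89 | 0.12 | 0.80 | 0.93 | 0.11 | 0.90 |
| 2 | Poly | 0.76 | 0.17 | 0.69 | 0.81 | 0.16 | 0.82 |
| 2 | Linear | 0.67 | 0.22 | 0.58 | 0.75 | 0.27 | 0.71 |
| 3 | RBF | 0.90 | 0.11 | 0.81 | 0.94 | 0.10 | 0.92 |
| 3 | Poly | 0.79 | 0.15 | 0.75 | 0.81 | 0.14 | 0.84 |
| 3 | Linear | 0.69 | 0.18 | 0.62 | 0.79 | 0.27 | 0.72 |
| 4 | RBF | 0.91 | 0.09 | 0.82 | 0.97 | 0.08 | 0.94 |
| 4 | Poly | 0.81 | 0.14 | 0.75 | 0.84 | 0.13 | 0.87 |
| 4 | Linear | 0.80 | 0.18 | 0.66 | 0.80 | 0.26 | 0.73 |
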
Table 6. Final results of training and verification for gene expression programming and support vector machine

| Model | Training R | Training RMSE (m³/s) | Training NS | Testing R | Testing RMSE (m³/s) | Testing NS |
|---|---|---|---|---|---|---|
| SVM | 0.91 | 0.09 | 0.82 | 0.97 | 0.08 | 0.94 |
| GEP | 0.82 | 0.15 | 0.75 | 0.88 | 0.15 | 0.76 |



Fig 4. Scatter plot between estimated and observed values of the gene expression programming and support vector machine models for recorded data in the verification stage (x-axis: time (month))
Fig 5. Error of the two optimal models as a percentage of the mean observed value (x-axis: time (month))


Finally, the difference between the observed inflow values and the values computed by the optimal models was calculated as a percentage of the mean observed value (the error value) and plotted against the recorded data (Fig. 5). As the figure shows, most of the errors of the models lie within the ±5% band, and the highest error rates of the gene expression programming and support vector machine models are 6.61 and 3.10 percent of the mean observed value, respectively. Of the two models (GEP and SVM), the SVM model has the lowest error value. Overall, both models show high estimation accuracy and reliability: the correlation between the observed and computed values is 0.880 for gene expression programming and 0.970 for the support vector machine. In addition, the significance test of the estimated and observed values at the 5% and 10% probability levels showed that the SVM model has a significant correlation at both levels.
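The error measure plotted in Fig. 5 can be sketched as below: each difference between computed and observed discharge is expressed as a percentage of the mean observed value, and the share of time steps inside a chosen band is counted. The sign convention (computed minus observed), the function names, and the sample values are assumptions made for the illustration.

```python
def percent_error_of_mean(obs, sim):
    """Difference at each time step as a percentage of the mean observed value.

    Sign convention assumed here: computed minus observed.
    """
    mean_obs = sum(obs) / len(obs)
    return [100.0 * (s - o) / mean_obs for o, s in zip(obs, sim)]

def share_within_band(errors, band=5.0):
    """Fraction of time steps whose absolute error is inside the band (%)."""
    inside = sum(1 for e in errors if abs(e) <= band)
    return inside / len(errors)

# Hypothetical observed and computed discharges (m3/s).
observed = [2.0, 4.0, 6.0, 4.0]
computed = [2.1, 3.9, 6.4, 4.0]

errors = percent_error_of_mean(observed, computed)
print([round(e, 2) for e in errors])
print(f"largest absolute error: {max(abs(e) for e in errors):.2f} %")
print(f"share within +/-5 %: {share_within_band(errors):.2f}")
```

Normalizing by the mean observed value rather than by each individual observation keeps the measure stable during low-flow months, where dividing by very small discharges would inflate the percentage.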

IV. CONCLUSIONS 

In this research, we evaluated the performance of several models for simulating the discharge of the Kakareza River in Lorestan province using monthly discharge data of the river. The models used were gene expression programming and support vector machine. The observed inflow values were compared with the inflow estimated by these models (GEP and SVM). The results are summarized as follows:

A: The SVM model has high accuracy and little error in estimating the minimum, maximum, middle, and peak discharge values, and high correlation with the observed values.

B: The gene expression programming model with the four basic arithmetic operations has a high ability to estimate the minimum, maximum, middle, and peak values; the support vector machine with the radial basis function kernel has a high ability to estimate the minimum and middle values, but its performance in estimating the maximum values is not sufficient.

C: Increasing the number of input parameters in the various models improves their performance in estimating inflow.

D: Estimating inflow with the combined models gives lower error and higher correlation than the other models for estimating inflow to dam reservoirs.

Overall, the results of this research showed that the support vector machine method has the highest accuracy among the models, which agrees with the results of Ghorbani et al. (2016), Moharrampour et al. (2012), and Asefa et al. (2005). This research also showed that the gene expression programming and support vector machine models can be used to estimate the inflow to the river.